[SOUND] This lecture is about
opinion mining and sentiment analysis,
covering motivation.
In this lecture,
we're going to start talking about
mining a different kind of knowledge.
Namely, knowledge about the observer or
humans that have generated the text data.
In particular, we're going to talk about
opinion mining and sentiment analysis.
As we discussed earlier, text data
can be regarded as data generated
from humans as subjective sensors.
In contrast, we have other devices such
as video recorders that can report what's
happening in the real world objectively, to
generate video data, for example.
Now the main difference between text
data and other data, like video data,
is that it has rich opinions,
and the content tends to be subjective
because it's generated from humans.
Now, this is actually a unique advantage
of text data, as compared with other data,
because this offers a great
opportunity to understand the observers.
We can mine text data to
understand their opinions.
Understand people's preferences,
how people think about something.
So this lecture and the following lectures
will be mainly about how we can mine and
analyze opinions buried
in a lot of text data.
So let's start with
the concept of opinion.
It's not that easy to
formally define opinion, but
mostly we would define
opinion as a subjective
statement describing what a person
believes or thinks about something.
Now, I highlighted quite a few words here.
And that's because it's worth thinking
a little bit more about these words.
And that will help us better
understand what's in an opinion.
And this further helps us to
define opinion more formally,
which is always needed to computationally
solve the problem of opinion mining.
So let's first look at the keyword
"subjective" here.
This is in contrast with objective
statements or factual statements.
Those statements can be proved right or
wrong.
And this is a key differentiating
factor from opinions,
which tend to be not
easy to prove wrong or
right, because they reflect what
the person thinks about something.
So in contrast, an objective statement can
usually be proved wrong or correct.
For example, you might say this
computer has a screen and a battery.
Now that's something you can check.
It either has a battery or it doesn't.
But in contrast with this, think about
sentences such as, this laptop has
the best battery, or
this laptop has a nice screen.
Now these statements
are more subjective, and
it's very hard to prove
whether they are wrong or correct.
So an opinion is a subjective statement.
And next let's look at
the keyword "person" here.
And that indicates that
there is an opinion holder.
Because when we talk about opinion,
it's about an opinion held by someone.
And then we notice that
there is something here.
So that is the target of the opinion.
The opinion is expressed
on this something.
And now, of course, believes or
thinks implies that
an opinion will depend on the culture or
background and the context in general.
Because a person might think
differently in a different context.
People from different backgrounds
may also think in different ways.
So this analysis shows that there are
multiple elements that we need to include
in order to characterize opinion.
So, what's a basic opinion
representation like?
Well, it should include at
least three elements, right?
First, it has to specify
who the opinion holder is.
So whose opinion is this?
Second, it must also specify the target,
what's this opinion about?
And third, of course,
we want the opinion content.
So what exactly is the opinion?
If we can identify these,
we get a basic understanding of the opinion,
and that can already be useful sometimes.
If we want to understand further,
we want an enriched opinion representation.
And that means we also want to
understand, for example,
the context of the opinion:
in what situation was the opinion expressed?
For example, what time was it expressed?
We would also like to understand
the opinion sentiment, and this is
to understand what the opinion tells
us about the opinion holder's feeling.
For example, is this opinion positive,
or negative?
Or perhaps the opinion holder was happy or
was sad?
So such understanding obviously
goes beyond just extracting
the opinion content;
it needs some analysis.
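
To make the representation concrete, here is a minimal Python sketch of a record with the three basic elements (holder, target, content) plus the two enriching ones (context, sentiment). The class name `Opinion`, the field names, and the helper `is_enriched` are assumptions made for illustration; they are not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Opinion:
    """Hypothetical container for one opinion representation."""
    holder: str    # whose opinion is this?
    target: str    # what is the opinion about?
    content: str   # what exactly is the opinion?
    # Enriched elements: optional, since they usually need extra analysis.
    context: dict = field(default_factory=dict)  # e.g. time or location
    sentiment: Optional[str] = None               # e.g. "positive", "negative"

    def is_enriched(self) -> bool:
        """True if any element beyond the basic three is filled in."""
        return bool(self.context) or self.sentiment is not None


# A basic representation: only holder, target, and content.
basic = Opinion(holder="reviewer", target="laptop",
                content="this laptop has a nice screen")

# An enriched representation adds context and sentiment.
enriched = Opinion(holder="reviewer", target="laptop",
                   content="this laptop has a nice screen",
                   context={"time": "2015"}, sentiment="positive")
```

Keeping context and sentiment optional mirrors the distinction drawn above: the basic elements can sometimes be read off directly, while the enriched ones generally require further analysis.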
So let's take a simple
example of a product review.
In this case, we have an explicit
opinion holder and an explicit target.
So it's obvious who the opinion holder is,
that's just the reviewer, and it's also often
very clear what the opinion target is,
that's the product being reviewed,
for example, iPhone 6.
When the review is posted, usually
you can extract such information easily.
Now the content, of course,
is a review text that's, in general,
also easy to obtain.
So you can see product reviews are fairly
easy to analyze in terms of obtaining
a basic opinion representation.
But of course, if you want to get more
information, you might want to know the context,
for example,
that the review was written in 2015.
Or, we want to know that the sentiment
of this review is positive.
So, this additional understanding of
course adds value to mining the opinions.
Now, you can see in this case the task
is relatively easy and that's
because the opinion holder and the opinion
target have already been identified.
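
As a rough illustration of why the review case is easy, the following Python sketch reads the holder, target, and content straight from a review record's metadata. The record layout (keys `reviewer`, `product`, `text`, `date`), the function name `review_to_opinion`, and the sample values are all hypothetical; real review data will be structured differently.

```python
def review_to_opinion(review: dict) -> dict:
    """Build a basic opinion representation from a review record.

    Assumes (hypothetically) that the record has 'reviewer', 'product',
    and 'text' keys, and optionally a 'date' key.
    """
    opinion = {
        "holder": review["reviewer"],  # the reviewer is the opinion holder
        "target": review["product"],   # the reviewed product is the target
        "content": review["text"],     # the review text is the content
    }
    # Context such as the posting time comes for free when it is recorded.
    if "date" in review:
        opinion["context"] = {"time": review["date"]}
    return opinion


# Made-up record for illustration only.
review = {
    "reviewer": "user_123",
    "product": "iPhone 6",
    "text": "I don't like this phone at all.",
    "date": "2015",
}
opinion = review_to_opinion(review)
```

Note that sentiment is not filled in here: unlike the other elements, it is not available as metadata and has to be inferred from the content.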
Now let's take a look at
a sentence in the news.
In this case, we have an implicit
holder and an implicit target.
And the task is in general harder.
So, we can identify opinion holder here,
and that's the governor of Connecticut.
We can also identify the target.
So one target is Hurricane Sandy, but
there is also another target
mentioned, which is the hurricane of 1938.
So what's the opinion?
Well, there's a negative sentiment here
that's indicated by words like bad and
worst.
And we can also, then, identify context,
New England in this case.
Now, unlike in the product review,
all these elements must be extracted by
using natural language processing techniques.
So, the task is much harder,
and we need deeper natural
language processing.
And these examples also
suggest that a lot of work can
easily be done for product reviews.
That's indeed what has happened.
Analyzing sentiment
in news is still quite difficult;
it's more difficult than the analysis
of opinions in product reviews.
Now there are also some other
interesting variations.
In fact, here we're going to
examine the variations of opinions,
more systematically.
First, let's think about
the opinion holder.
The holder could be an individual or
it could be group of people.
Sometimes, the opinion
was from a committee.
Or from a whole country of people.
The opinion target can also vary a lot.
It can be about one entity,
a particular person, a particular product,
a particular policy, etc.
But it could be about a group of products.
Could be about the products
from a company in general.
Could also be very specific
about one attribute, though.
An attribute of the entity.
For example,
it's just about the battery of iPhone.
The target could also be someone else's opinion:
one person might comment on
another person's opinion, etc.
So, you can see there is a lot of
variation here that will cause
the problem to vary a lot.
Now, opinion content, of course,
can also vary a lot. On the surface,
you can identify a one-sentence opinion or
a one-phrase opinion.
But you can also have longer
text to express an opinion,
like the whole article.
And furthermore we can identify
the variation in the sentiment or
emotion dimension, that's about
the feeling of the opinion holder.
So, we can distinguish positive
versus negative or neutral, or
happy versus sad, etc.
Finally, the opinion
context can also vary.
We can have a simple context, like
different time or different locations.
But there could also be complex contexts,
such as some background
of the topic being discussed.
So when an opinion is expressed in a
particular discourse context, it has to
be interpreted in a different way than
when it's expressed in another context.
So the context can be very [INAUDIBLE] to
entire discourse context of the opinion.
From a computational perspective,
we're mostly interested in what opinions
can be extracted from text data.
So, it turns out that we can
also distinguish
different kinds of opinions in text
data from a computational perspective.
First, the observer might make
a comment about an opinion target
in the observed world.
So in this case we have the author's opinion.
For example,
I don't like this phone at all.
And that's an opinion of this author.
In contrast, the text might also
report opinions of others.
So the person could also make an observation
about another person's opinion and
report this opinion.
So for example,
I believe he loves the painting.
And that opinion is
really expressed by another person here.
So, it doesn't mean this
author loves that painting.
So clearly, the two kinds of opinions
need to be analyzed in different ways,
and sometimes in product reviews
you can see both: although mostly the opinions
are from the reviewer,
sometimes a reviewer might mention
opinions of his friend or her friend.
Another complication is that
there may be indirect opinions or
inferred opinions that can be obtained
by making inferences on
what's expressed in the text, which might
not necessarily look like an opinion.
For example, one statement might be,
this phone ran out of
battery in just one hour.
Now, this is in a way a factual statement
because it's either true or false, right?
You can even verify that,
but from this statement,
one can also infer some negative opinions
about the quality of the battery of
this phone, or the feeling of
the opinion holder about the battery.
The opinion holder clearly wishes
that the battery would last longer.
So these are interesting variations
that we need to pay attention to when we
extract opinions.
Also, for
this reason about indirect opinions,
it's often also very useful to extract
whatever the person has said about
the product, and sometimes factual
sentences like these are also very useful.
So, from a practical viewpoint,
sometimes we don't necessarily
extract only the subjective sentences.
Instead, all the sentences that
are about the opinion target are useful for
understanding the person or
understanding the product being commented on.
So the task of opinion mining can be
defined as taking text data as input
to generate a set of
opinion representations.
In each representation we should
identify the opinion holder,
target, content, and context.
Ideally we can also infer the opinion
sentiment from the content and
the context to better understand
the opinion.
Now often, some elements of
the representation are already known.
I just gave a good example in
the case of product reviews,
where the opinion holder and the opinion
target are often explicitly identified.
And that's why this turns out to be
one of the simplest opinion mining tasks.
Now, it's interesting to think about
the other tasks that might be also simple.
Because those are the cases
where you can easily build
applications by using
opinion mining techniques.
So now that we have talked about what is
opinion mining, we have defined the task.
Let's also just talk a little bit about
why opinion mining is very important and
why it's very useful.
So here, I identify three major reasons,
three broad reasons.
The first is it can help decision support.
It can help us optimize our decisions.
We often look at other people's opinions,
read the reviews,
in order to make decisions like
buying a product or using a service.
We also would be interested
in others' opinions
when we decide whom to vote for, for example.
And policy makers
may also want to know people's
opinions when designing a new policy.
So that's one general
kind of application.
And it's very broad, of course.
The second application is to understand
people, and this is also very important.
For example, it could help
understand people's preferences.
And this could help us
better serve people.
For example, we optimize a product search
engine or optimize a recommender system
if we know what people are interested in,
what people think about products.
It can also help with advertising,
of course, and we can have targeted
advertising if we know what kind of
people tend to like what kind of product.
Now the third kind of application
can be called voluntary survey.
Now this is mostly important research
that used to be done by doing surveys,
doing manual surveys,
with questions and answers.
People need to fill in forms
to answer the questions.
Now this is directly related to humans
as sensors, and we can usually aggregate
opinions from a lot of humans to
kind of assess the general opinion.
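
The aggregation step can be sketched in a few lines of Python. This assumes the opinions have already been extracted and labeled with a target and a sentiment, which is the hard part and is not shown here. The function names `aggregate_sentiment` and `positive_share`, and the sample data, are made up for illustration.

```python
from collections import Counter, defaultdict


def aggregate_sentiment(opinions):
    """Count sentiment labels per target.

    `opinions` is an iterable of (target, sentiment) pairs, assumed to
    come from some earlier extraction step.
    """
    counts = defaultdict(Counter)
    for target, sentiment in opinions:
        counts[target][sentiment] += 1
    # Convert to plain dicts so the result is easy to inspect.
    return {target: dict(c) for target, c in counts.items()}


def positive_share(summary, target):
    """Fraction of opinions about `target` that are labeled positive."""
    c = summary.get(target, {})
    total = sum(c.values())
    return c.get("positive", 0) / total if total else 0.0


# Made-up labeled opinions from several holders.
opinions = [
    ("battery", "negative"),
    ("battery", "negative"),
    ("screen", "positive"),
    ("battery", "positive"),
]
summary = aggregate_sentiment(opinions)
share = positive_share(summary, "battery")
```

A simple count like this treats every opinion holder equally; whether that is appropriate depends on the application.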
Now this would be very useful for
business intelligence where manufacturers
want to know where their products
have advantages over others.
What are the winning
features of their products, and the
winning features of competing products?
Market research has to do with
understanding consumers' opinions,
and this is clearly very useful for
that.
Data-driven social science research
can benefit from this because researchers can
do text mining to understand
people's opinions.
And if you can aggregate a lot of opinions
from social media, from a lot of public
information, then you can actually
do some study of some questions.
For example, we can study the behavior of
people on social media, on social networks.
And these can be regarded as a voluntary
survey done by those people.
In general, we can gain a lot of advantage
in any prediction task because we can
leverage the text data as
extra data about any problem.
And so we can use text based
prediction techniques to help you
make predictions or
improve the accuracy of prediction.
[MUSIC]

